Abstract. The aim of this paper is to prove an inequality between relative entropy
and the sum of average conditional relative entropies of the following form: For a
fixed probability measure $q^n$ on $\mathcal X^n$ ($\mathcal X$ is a finite set), and
any probability measure $p^n = \mathcal L(Y^n)$ on $\mathcal X^n$,

$$D(p^n\|q^n) \le \text{Const.} \sum_{i=1}^n \mathbb E_{p^n}
D\bigl(p_i(\cdot|Y_1,\dots,Y_{i-1},Y_{i+1},\dots,Y_n) \,\|\,
q_i(\cdot|Y_1,\dots,Y_{i-1},Y_{i+1},\dots,Y_n)\bigr),$$  (*)

where $p_i(\cdot|y_1,\dots,y_{i-1},y_{i+1},\dots,y_n)$ and
$q_i(\cdot|x_1,\dots,x_{i-1},x_{i+1},\dots,x_n)$ denote the
local specifications for $p^n$ resp. $q^n$, i.e., the conditional distributions of the
$i$-th coordinate, given the other coordinates. The constant shall depend on the
properties of the local specifications of $q^n$.

The inequality (*) is meaningful in product spaces, both in the discrete and the
continuous case, and can be used to prove a logarithmic Sobolev inequality for $q^n$,
provided uniform logarithmic Sobolev inequalities are available for
$q_i(\cdot|x_1,\dots,x_{i-1},x_{i+1},\dots,x_n)$, for all fixed $i$ and all fixed
$(x_1,\dots,x_{i-1},x_{i+1},\dots,x_n)$.

(*) directly implies that the Gibbs sampler associated with $q^n$ is a contraction for
relative entropy.

In this paper we derive inequality (*), and thereby a logarithmic Sobolev inequality,
in discrete product spaces, by proving inequalities for an appropriate
Wasserstein-like distance.

A logarithmic Sobolev inequality is, roughly speaking, a contractivity property 
of relative entropy with respect to some Markov semigroup. It is much easier to 
prove contractivity for a distance between measures, than for relative entropy, since 
distances satisfy the triangle inequality, and for them well known linear tools, like 
estimates through matrix norms can be applied. 
1. Introduction and statement of some results.

Let $\mathcal X$ be a finite set, and $\mathcal X^n$ the set of $n$-length sequences
from $\mathcal X$. Denote by $\mathcal P(\mathcal X^n)$ the space of probability
measures on $\mathcal X^n$. For a sequence $x^n \in \mathcal X^n$ we denote
by $x_i$ the $i$-th coordinate of $x^n$.

We consider a reference probability measure $q^n \in \mathcal P(\mathcal X^n)$ which
will be fixed throughout Sections 1-3. In Section 4 we still consider a fixed
probability measure denoted by $q$, with some subscript.

The aim of this paper is to prove logarithmic Sobolev inequalities for measures
on discrete product spaces, by proving inequalities for an appropriate
Wasserstein-like distance. A logarithmic Sobolev inequality is, roughly speaking, a
contractivity property of relative entropy with respect to some Markov semigroup. It
is much easier to prove contractivity for a distance between measures than for
relative entropy, since a distance is symmetric and satisfies the triangle inequality.
Our method shall be used to prove logarithmic Sobolev inequalities for measures
satisfying a version of Dobrushin's uniqueness condition, as well as Gibbs measures
satisfying a strong mixing condition.

To explain the results, we need some definitions and some notation. 

Notation. If $r$ and $s$ are two probability measures (on any measurable space) then
we denote by $|r - s|$ their variational distance:

$$|r - s| = \sup_A |r(A) - s(A)|.$$
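The definition can be illustrated numerically. The following is a minimal sketch, assuming distributions on a finite set are given as Python dicts from points to probabilities (the function names are illustrative, not from the paper). It uses the standard fact that on a finite set the supremum over events equals half the $\ell_1$ distance, and checks this against a brute-force maximum over all events.

```python
from itertools import chain, combinations

def variational_distance(r, s):
    """sup_A |r(A) - s(A)|, computed as half the l1 distance (finite sets only)."""
    support = set(r) | set(s)
    return 0.5 * sum(abs(r.get(x, 0.0) - s.get(x, 0.0)) for x in support)

def variational_distance_bruteforce(r, s):
    """Same quantity, by enumerating every event A (exponential; tiny sets only)."""
    support = sorted(set(r) | set(s))
    events = chain.from_iterable(
        combinations(support, size) for size in range(len(support) + 1)
    )
    return max(
        abs(sum(r.get(x, 0.0) - s.get(x, 0.0) for x in event)) for event in events
    )

# Hypothetical example distributions on a three-point set.
r_example = {"a": 0.5, "b": 0.3, "c": 0.2}
s_example = {"a": 0.2, "b": 0.3, "c": 0.5}
```

With this normalization the distance takes values in [0, 1], which is the convention the inequalities below rely on.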


Definition: $W_2$ distance (c.f. [B-L-M], Theorem 8.2).

For probability measures $r^n, s^n \in \mathcal P(\mathcal X^n)$ let $Z^n$ and $U^n$
represent $r^n$ resp. $s^n$, i.e., $Z^n$ and $U^n$ are random variables with
distributions $\mathcal L(Z^n) = r^n$ and $\mathcal L(U^n) = s^n$ respectively.
We define

$$W_2(r^n, s^n) = \min_\pi \sqrt{\sum_{i=1}^n \Pr\nolimits_\pi^2\{Z_i \ne U_i\}},$$

where the minimum is taken over all joint distributions $\pi = \mathcal L(Z^n, U^n)$
with marginals $r^n$ and $s^n$.

Note that $W_2$ is a distance on $\mathcal P(\mathcal X^n)$, but it cannot be defined
by taking the minimum expectation of a distance (or some power of a distance) on
$\mathcal X^n$.
Definition: Relative entropy, conditional relative entropy. For probability
measures $r$ and $s$ defined on a finite set $\mathcal Z$, we denote by $D(r\|s)$ the
relative entropy of $r$ with respect to $s$:

$$D(r\|s) = \sum_{u \in \mathcal Z} r(u) \log \frac{r(u)}{s(u)},$$

with the convention $0 \log 0 = 0$ and $a \log(a/0) = \infty$ for $a > 0$. If $Z$ and
$U$ are random variables with values in $\mathcal Z$ and distributed according to
$r = \mathcal L(Z)$ resp. $s = \mathcal L(U)$, then we shall also use the notation
$D(Z\|U)$ for the relative entropy $D(r\|s)$. If, moreover, we are given a probability
measure $\pi = \mathcal L(S)$ on another finite set $\mathcal S$, and conditional
distributions $\mu(\cdot|s) = \mathcal L(Z|S = s)$, $\nu(\cdot|s) = \mathcal L(U|S = s)$,
then we consider the average relative entropy

$$\mathbb E_\pi D(\mu(\cdot|S)\|\nu(\cdot|S)) = \sum_{s \in \mathcal S} \pi(s) D(\mu(\cdot|s)\|\nu(\cdot|s)).$$

For $\mathbb E_\pi D(\mu(\cdot|S)\|\nu(\cdot|S))$ we shall use either of the notations

$$D(\mu(\cdot|S)\|\nu(\cdot|S)), \quad D(\mu(\cdot|S)\|U|S), \quad D(Z|S\|\nu(\cdot|S)), \quad D(Z|S\|U|S)$$



(omitting the symbol of expectation as is usual in information theory). 
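As a small illustration of these two definitions, here is a sketch assuming distributions are dicts from points to probabilities and conditional distributions are dicts indexed by the conditioning value (all names are illustrative). Natural logarithms are used, and the conventions for zero probabilities follow the definition above.

```python
import math

def relative_entropy(r, s):
    """D(r||s) with the conventions 0 log 0 = 0 and a log(a/0) = infinity."""
    total = 0.0
    for u, r_u in r.items():
        if r_u == 0.0:
            continue  # 0 log 0 = 0
        s_u = s.get(u, 0.0)
        if s_u == 0.0:
            return math.inf  # r charges a point that s does not
        total += r_u * math.log(r_u / s_u)
    return total

def average_relative_entropy(pi, mu, nu):
    """sum_s pi(s) D(mu(.|s) || nu(.|s)); terms with pi(s) = 0 are skipped."""
    return sum(
        weight * relative_entropy(mu[s], nu[s])
        for s, weight in pi.items()
        if weight > 0.0
    )
```

Skipping the terms with zero weight avoids the undefined product of zero and infinity when a conditional entropy is infinite on a null set.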


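Notation.

For $y^n = (y_1, y_2, \dots, y_n) \in \mathcal X^n$ and $I \subset [1,n]$, we write

$$y_I = (y_k : k \in I) \quad\text{and}\quad \bar y_I = (y_k : k \notin I).$$

Moreover, if $p^n = \mathcal L(Y^n)$ then

$$p_I = \mathcal L(Y_I), \quad p_I(\cdot|\bar y_I) = \mathcal L(Y_I \mid \bar Y_I = \bar y_I), \quad
\bar p_I = \mathcal L(\bar Y_I), \quad \bar p_I(\cdot|y_I) = \mathcal L(\bar Y_I \mid Y_I = y_I).$$

If $I = \{i\}$ then we write $i$ instead of $\{i\}$.


Definition. The conditional distributions $q_i(\cdot|\bar x_i)$ are called the local
specifications of the distribution $q^n$.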


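Theorem 1.

Set

$$\alpha = \min q_i(x_i|\bar x_i),$$  (1.1)

where the minimum is taken over all $x^n \in \mathcal X^n$ satisfying $q^n(x^n) > 0$
and all $i \in [1,n]$.
Fix a $p^n = \mathcal L(Y^n) \in \mathcal P(\mathcal X^n)$ satisfying

$$q^n(x^n) = 0 \implies p^n(x^n) = 0.$$  (1.2)

Assume that $q^n \in \mathcal P(\mathcal X^n)$ satisfies all the inequalities

$$W_2^2\bigl(p_I(\cdot|\bar y_I), q_I(\cdot|\bar y_I)\bigr) \le
C \cdot \mathbb E\Bigl\{\sum_{i \in I} \bigl|p_i(\cdot|\bar Y_i) - q_i(\cdot|\bar Y_i)\bigr|^2
\Bigm| \bar Y_I = \bar y_I\Bigr\},$$  (1.3)

where $I \subset [1,n]$ and $\bar y_I \in \mathcal X^{[1,n]\setminus I}$ is a fixed
sequence. Then

$$D(p^n\|q^n) \le \frac{4C}{\alpha} \cdot \sum_{i=1}^n \mathbb E\bigl|p_i(\cdot|\bar Y_i) - q_i(\cdot|\bar Y_i)\bigr|^2
\le \frac{2C}{\alpha} \cdot \sum_{i=1}^n D\bigl(p_i(\cdot|\bar Y_i)\,\|\,q_i(\cdot|\bar Y_i)\bigr).$$  (1.4)

(Condition (1.2) is necessary, since otherwise $D(p^n\|q^n)$ could be $\infty$, while
the middle term is always finite. On the other hand, for the inequality between the
first and last terms it is not necessary to assume (1.2), since if
$D(p^n\|q^n) = \infty$ then the last term is $\infty$ as well.)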

Remark. In [M] a bound, analogous to the one relating the first and last terms of
(1.4), was proved for measures on Euclidean spaces (under reasonable conditions).
That bound was used to derive a logarithmic Sobolev inequality, improving on an
earlier result in [O-R]. In the present paper a logarithmic Sobolev inequality shall
be deduced from the first inequality in (1.4) (Corollary 2 to Theorem 1).

Theorem 1 implies that the Gibbs sampler (or Glauber dynamics) defined by the
local specifications of $q^n$ is a strict contraction for relative entropy.

Definition: Gibbs sampler.

For $i \in [1,n]$ let $\Gamma_i : \mathcal P(\mathcal X^n) \to \mathcal P(\mathcal X^n)$
be the Markov kernel

$$\Gamma_i(z^n|y^n) = \delta(\bar y_i, \bar z_i) \cdot q_i(z_i|\bar y_i), \qquad y^n, z^n \in \mathcal X^n.$$

(I.e., $\Gamma_i$ leaves all but the $i$-th coordinate unchanged, and updates the $i$-th
coordinate according to $q_i(\cdot|\bar y_i)$.) Finally, set

$$\Gamma = \frac1n \sum_{i=1}^n \Gamma_i.$$

I.e., $\Gamma$ selects an $i \in [1,n]$ at random, and applies $\Gamma_i$. It is easy to
see that $\Gamma$ preserves, and is reversible with respect to, $q^n$. $\Gamma$ is called
the Gibbs sampler governed by the local specifications of $q^n$.
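The construction of the Gibbs sampler can be sketched on a small example. The code below is an illustration only: it assumes a strictly positive measure stored as a dict over tuples, and the helper names and the example measure are hypothetical. It builds the random-scan kernel (pick a coordinate uniformly, redraw it from the local specification) and can be used to check numerically that the kernel preserves the reference measure and is reversible.

```python
from itertools import product

def local_spec(q, i, x, alphabet):
    """q_i(x_i | other coordinates of x); assumes the conditioning event has
    positive probability under q."""
    denom = sum(q[x[:i] + (a,) + x[i + 1:]] for a in alphabet)
    return q[x] / denom

def gibbs_kernel(q, alphabet, n):
    """Random-scan kernel: kernel[y][z] = (1/n) sum_i [z, y agree off i] q_i(z_i | y)."""
    states = list(product(alphabet, repeat=n))
    kernel = {y: {z: 0.0 for z in states} for y in states}
    for y in states:
        for i in range(n):
            for a in alphabet:
                z = y[:i] + (a,) + y[i + 1:]  # differs from y at most at site i
                kernel[y][z] += local_spec(q, i, z, alphabet) / n
    return kernel

def push_forward(p, kernel):
    """The measure p * kernel obtained after one step started from p."""
    out = {z: 0.0 for z in kernel}
    for y, p_y in p.items():
        for z, weight in kernel[y].items():
            out[z] += p_y * weight
    return out

# Hypothetical reference measure on {0,1}^2.
q_example = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
kernel_example = gibbs_kernel(q_example, (0, 1), 2)
```

Reversibility here amounts to the detailed-balance identity between the measure and the kernel, which holds coordinate by coordinate because each single-site update depends only on the unchanged coordinates.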
Corollary 1 to Theorem 1.

If $q^n$ on $\mathcal X^n$ satisfies the conditions of Theorem 1 then

$$D(p^n\Gamma\|q^n) \le \Bigl(1 - \frac{\alpha}{2nC}\Bigr) \cdot D(p^n\|q^n).$$  (1.5)

(1.5) follows from Theorem 1 by the inequality

$$D(p^n\Gamma\|q^n) \le \frac1n \sum_{i=1}^n D(p^n\Gamma_i\|q^n)$$

(a consequence of the convexity of relative entropy), together with the identity

$$D(p^n\|q^n) - D(p^n\Gamma_i\|q^n) = D\bigl(p_i(\cdot|\bar Y_i)\,\|\,q_i(\cdot|\bar Y_i)\bigr).$$
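The identity used here (the entropy lost by updating coordinate $i$ equals the average conditional relative entropy of the $i$-th local specifications) can be checked numerically. The sketch below assumes strictly positive measures on $\{0,1\}^2$ stored as dicts over tuples; the names and example values are hypothetical.

```python
import math

ALPHABET = (0, 1)

def kl(r, s):
    """Relative entropy D(r||s) for strictly positive s."""
    return sum(r_u * math.log(r_u / s[u]) for u, r_u in r.items() if r_u > 0)

def cond(dist, i, x):
    """Conditional probability of x_i given the other coordinates of x."""
    denom = sum(dist[x[:i] + (a,) + x[i + 1:]] for a in ALPHABET)
    return dist[x] / denom

def update_site(p, q, i):
    """The measure obtained from p by redrawing coordinate i from q's local
    specification while keeping the other coordinates."""
    out = {}
    for z in p:
        marginal = sum(p[z[:i] + (a,) + z[i + 1:]] for a in ALPHABET)
        out[z] = marginal * cond(q, i, z)
    return out

def avg_conditional_kl(p, q, i):
    """D(p_i(.|rest) || q_i(.|rest)) averaged over the other coordinates under p."""
    return sum(
        p_y * math.log(cond(p, i, y) / cond(q, i, y))
        for y, p_y in p.items() if p_y > 0
    )

# Hypothetical example measures.
p_example = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
q_example = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
```

The check works because updating site $i$ leaves the marginal of the remaining coordinates unchanged and replaces the conditional law of coordinate $i$ by that of the reference measure, so only the conditional term of the chain rule disappears.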


Theorem 1 also implies Gross’ logarithmic Sobolev inequality defined as follows: 

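Definition: logarithmic Sobolev inequality for a Markov kernel.

Let $(\mathcal Z, \pi)$ be a finite probability space, and $G : \mathcal Z \to \mathcal Z$
a Markov kernel with invariant measure $\pi$. The Dirichlet form associated with $G$ is

$$\mathcal E_G(f,g) = \langle (\mathrm{Id} - G) f, g \rangle_\pi,
\qquad \langle f, g \rangle_\pi = \sum_{z \in \mathcal Z} \pi(z) f(z) g(z).$$

We say that $G$ satisfies a logarithmic Sobolev inequality with logarithmic Sobolev
constant $c$ if: for every probability measure $p$ on $\mathcal Z$ we have

$$D(p\|\pi) \le c \cdot \mathcal E_G(\sqrt f, \sqrt f),$$

where $f(z) = p(z)/\pi(z)$.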


The property expressed by the logarithmic Sobolev inequality was defined by L. 
Gross [Gr] in 1975. For an introduction to logarithmic Sobolev inequalities and 
their manifold interpretations and uses, c.f. [L] and [R]. 

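Theorem 1 implies Gross' logarithmic Sobolev inequality for the Gibbs sampler
$\Gamma$. A simple calculation shows that

$$\mathcal E_\Gamma\Bigl(\sqrt{p^n/q^n}, \sqrt{p^n/q^n}\Bigr) = \frac1n \sum_{i=1}^n
\mathbb E\Bigl(1 - \Bigl(\sum_{y_i \in \mathcal X} \sqrt{p_i(y_i|\bar Y_i) \cdot q_i(y_i|\bar Y_i)}\Bigr)^2\Bigr).$$

(Using the fact that, for fixed $\bar y_i$, the measure $\Gamma_i(\cdot|y^n)$ does not
depend on $y_i$, we just calculate the Dirichlet form for a matrix with identical rows.)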
Corollary 2 to Theorem 1.

If $q^n$ on $\mathcal X^n$ satisfies the conditions of Theorem 1 then

$$D(p^n\|q^n) \le \frac{4C}{\alpha} \cdot \sum_{i=1}^n
\mathbb E\Bigl(1 - \Bigl(\sum_{y_i \in \mathcal X} \sqrt{p_i(y_i|\bar Y_i) \cdot q_i(y_i|\bar Y_i)}\Bigr)^2\Bigr)
= \frac{4Cn}{\alpha} \cdot \mathcal E_\Gamma\Bigl(\sqrt{\frac{p^n}{q^n}}, \sqrt{\frac{p^n}{q^n}}\Bigr).$$

This can be considered a dimension free logarithmic Sobolev inequality, since $\Gamma$
only updates one coordinate.


Corollary 2 follows from Theorem 1 by the following 

Lemma 1. (The proof is in Appendix A) 

Let $r$ and $s$ be two probability measures on $\mathcal X$. Then

$$|r - s|^2 \le 1 - \Bigl(\sum_{y \in \mathcal X} \sqrt{r(y) s(y)}\Bigr)^2.$$
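Lemma 1 bounds the squared variational distance by one minus the squared Hellinger affinity $\sum_y \sqrt{r(y)s(y)}$. A quick randomized sanity check can be sketched as follows, assuming distributions are lists of probabilities over a common finite set (names are illustrative; a numerical check is of course not a proof).

```python
import math
import random

def tv(r, s):
    """Variational distance, as half the l1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(r, s))

def affinity(r, s):
    """Hellinger affinity: sum over points of sqrt(r * s)."""
    return sum(math.sqrt(a * b) for a, b in zip(r, s))

def random_distribution(size, rng):
    """A random probability vector of the given size."""
    weights = [rng.random() for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]

def lemma1_gap(r, s):
    """Right-hand side minus left-hand side of Lemma 1; non-negative if it holds."""
    return 1.0 - affinity(r, s) ** 2 - tv(r, s) ** 2
```

For two point masses at different points the gap is exactly zero, so the inequality cannot be improved by a constant factor.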


Theorem 1 can be applied to distributions $q^n$ satisfying the following version of
Dobrushin's uniqueness condition:

Definition: Dobrushin's uniqueness condition.

We say that $q^n$ satisfies (an $L_2$-version of) Dobrushin's uniqueness condition with
coupling matrix

$$A = (a_{k,i})$$

if: for any pair of integers $k, i \in [1,n]$, $k \ne i$, and any two sequences
$z^n, s^n \in \mathcal X^n$ differing only in the $k$-th coordinate,

$$|q_i(\cdot|\bar z_i) - q_i(\cdot|\bar s_i)| \le a_{k,i},$$  (1.6)

and, setting $a_{i,i} = 0$ for all $i$,

$$\|A\|_2 < 1.$$

This differs from Dobrushin's original uniqueness condition where the norm $\|A\|_1$
is assumed to be $< 1$.
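For a small measure the smallest admissible coupling matrix and its $\ell_2$ operator norm can be computed directly. The sketch below is illustrative only: it assumes a strictly positive measure stored as a dict over tuples, takes each entry to be the largest variational distance between local specifications over pairs of configurations differing at one site, and estimates the norm by power iteration in plain Python (function names and the example measure are hypothetical).

```python
import math
from itertools import product

def conditional(q, i, x, alphabet):
    """The local specification q_i(. | other coordinates of x) as a list."""
    values = [q[x[:i] + (a,) + x[i + 1:]] for a in alphabet]
    total = sum(values)
    return [v / total for v in values]

def coupling_matrix(q, alphabet, n):
    """A[k][i] = max variational distance between q_i(.|z) and q_i(.|s) over
    pairs z, s differing only at site k (diagonal set to 0)."""
    A = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for i in range(n):
            if i == k:
                continue
            for z in product(alphabet, repeat=n):
                for b in alphabet:
                    s = z[:k] + (b,) + z[k + 1:]
                    cz = conditional(q, i, z, alphabet)
                    cs = conditional(q, i, s, alphabet)
                    dist = 0.5 * sum(abs(u - v) for u, v in zip(cz, cs))
                    A[k][i] = max(A[k][i], dist)
    return A

def spectral_norm(A, iterations=500):
    """Largest singular value of a non-negative matrix, by power iteration
    on A^T A started from the all-ones direction."""
    n = len(A)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        Av = [sum(A[r][c] * v[c] for c in range(n)) for r in range(n)]
        AtAv = [sum(A[r][c] * Av[r] for r in range(n)) for c in range(n)]
        length = math.sqrt(sum(x * x for x in AtAv))
        if length == 0.0:
            return 0.0
        v = [x / length for x in AtAv]
    Av = [sum(A[r][c] * v[c] for c in range(n)) for r in range(n)]
    return math.sqrt(sum(x * x for x in Av))

# Hypothetical reference measure on {0,1}^2.
q_example = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
A_example = coupling_matrix(q_example, (0, 1), 2)
```

For two coordinates the matrix has only off-diagonal entries, so its norm is simply the larger of the two; the condition holds for the example whenever that value is below 1.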
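Theorem 2.

Assume that the measure $q^n$ on $\mathcal X^n$ satisfies Dobrushin's uniqueness
condition with coupling matrix $A$, $\|A\|_2 < 1$. Then the conditions of Theorem 1 are
satisfied with $C = 1/(1 - \|A\|_2)^2$. Thus for any $p^n \in \mathcal P(\mathcal X^n)$
satisfying (1.2):

$$D(p^n\|q^n) \le \frac4\alpha \cdot \frac{1}{(1 - \|A\|_2)^2} \cdot
\sum_{i=1}^n \mathbb E\bigl|p_i(\cdot|\bar Y_i) - q_i(\cdot|\bar Y_i)\bigr|^2
\le \frac2\alpha \cdot \frac{1}{(1 - \|A\|_2)^2} \cdot
\sum_{i=1}^n D\bigl(p_i(\cdot|\bar Y_i)\,\|\,q_i(\cdot|\bar Y_i)\bigr),$$  (1.7)

and

$$D(p^n\Gamma\|q^n) \le \Bigl(1 - \frac1n \cdot \frac\alpha2 \cdot (1 - \|A\|_2)^2\Bigr) \cdot D(p^n\|q^n).$$  (1.8)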


Remark. In [Z] a logarithmic Sobolev inequality is proved for discrete spin systems, 
where the title suggests that it uses Dobrushin’s uniqueness condition. However, 
the condition used there is reminiscent but not identical to Dobrushin’s uniqueness 
condition. Moreover, an inequality of the form relating the first and last terms of 
(1.4) has been recently proved in [C-M-T], assuming conditions slightly reminiscent 
of Dobrushin’s uniqueness condition. 

Theorem 1 is proved in Section 2, and Theorem 2 in Section 3. 

In Section 4 we are going to deduce a logarithmic Sobolev inequality from a strong
mixing condition, for measures $q$ on $\mathcal X^{\mathbb Z^d}$. (Under the additional
condition that the local specifications $q_k(x_k|x_i, i \ne k)$, if not equal to 0, are
bounded from below.) The strong mixing condition we use is the same as Dobrushin and
Shlosman's strong mixing conditions, but we do not assume that $q$ is a Markov field.
Our strong mixing condition can also be considered as a generalization of
$\Phi$-mixing for (stationary) probability measures on $\mathcal X^{\mathbb Z}$. For
non-Markov stationary probability measures on $\mathcal X^{\mathbb Z}$ it is more
restrictive than usual strong mixing.

The first proof for the implication that Dobrushin and Shlosman's strong mixing
conditions imply a logarithmic Sobolev inequality for Markov fields was given by D.
Stroock and B. Zegarlinski [S-Z1], [S-Z2] in 1992 (where the authors also proved the
converse implication, i.e. that Dobrushin and Shlosman's strong mixing conditions
for Markov fields are equivalent to the logarithmic Sobolev inequality). The
arguments in [S-Z2] are quite hard to follow. In 2001, F. Cesi proved that Dobrushin
and Shlosman's strong mixing conditions imply a logarithmic Sobolev inequality;
his approach is quite different from the previous ones, and much simpler.

We feel that there is still room for alternative and perhaps simpler proofs in this
important topic. Moreover, our proof is valid without the Markovity assumption.
(It may be, though, that the proofs in [S-Z2] and [C] can also be generalized to
the non-Markovian case; this has just not been tried.)

We believe that the separate parts of our proof (Theorem 1 and the applicability
of Theorem 1) are comprehensible in themselves, thus making the whole proof easier
to follow.


2. Proof of Theorem 1. 

We need the following 

Lemma 2. 

Let r and s be two probability measures on X. Set 

$$\alpha_s = \min_{x :\, s(x) > 0} s(x).$$

If $D(r\|s) < \infty$ then

$$D(r\|s) \le \frac{4}{\alpha_s} \cdot |r - s|^2.$$  (2.1)


Remark. 

Inequality (2.1) can be considered as a converse to the Pinsker-Csiszar-Kullback 
inequality which says that 

$$|r - s|^2 \le \frac12 D(r\|s).$$

However, there is no uniform converse: the reverse inequality must depend on s. 
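Both directions can be checked numerically on random examples: the Pinsker-Csiszar-Kullback bound, and the converse of Lemma 2 with the factor $4/\alpha_s$, where $\alpha_s$ is the smallest positive probability under $s$. The sketch below assumes distributions are lists of probabilities, natural logarithms, and the variational distance normalized as $\sup_A |r(A) - s(A)|$; the names are illustrative.

```python
import math
import random

def tv(r, s):
    """Variational distance, as half the l1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(r, s))

def kl(r, s):
    """Relative entropy D(r||s) in nats; assumes s > 0 wherever r > 0."""
    return sum(a * math.log(a / b) for a, b in zip(r, s) if a > 0)

def random_distribution(size, rng):
    """A random probability vector of the given size."""
    weights = [rng.random() for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]

def check_both_bounds(r, s, tol=1e-12):
    """True if |r-s|^2 <= D/2 and D <= (4/alpha_s) |r-s|^2 both hold."""
    alpha_s = min(b for b in s if b > 0)
    d, t = kl(r, s), tv(r, s)
    return t * t <= d / 2 + tol and d <= 4.0 / alpha_s * t * t + tol
```

The dependence on $\alpha_s$ is visible when $s$ has a very small atom: the relative entropy can then be large while the variational distance stays small.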


Proof. 

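Set $\mathcal X_+ = \{x \in \mathcal X : s(x) > 0\}$. The following inequality is well
known:

$$D(r\|s) \le \sum_{x \in \mathcal X_+} \frac{|r(x) - s(x)|^2}{s(x)}.$$

It follows that

$$D(r\|s) \le \frac{1}{\alpha_s} \sum_{x \in \mathcal X_+} |r(x) - s(x)|^2
\le \frac{1}{\alpha_s} \Bigl(\sum_{x} |r(x) - s(x)|\Bigr)^2
= \frac{4}{\alpha_s} \cdot |r - s|^2.$$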

We proceed to the proof of Theorem 1. Let $\pi = \mathcal L(Y^n, X^n)$ be a coupling of
$p^n = \mathcal L(Y^n)$ and $q^n = \mathcal L(X^n)$ that achieves $W_2(p^n, q^n)$.

We apply induction on $n$. Assume that the theorem holds for $n - 1$.
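By the expansion formula for relative entropy we have

$$D(p^n\|q^n) = \frac1n \sum_{i=1}^n D(Y_i\|X_i)
+ \frac1n \sum_{i=1}^n D\bigl(\bar Y_i|Y_i \,\|\, \bar q_i(\cdot|Y_i)\bigr).$$  (2.2)

For each fixed $y_i$, the measure $\bar q_i(\cdot|y_i)$ satisfies the conditions of the
theorem. By the induction hypothesis,

$$\frac1n \sum_{i=1}^n D\bigl(\bar Y_i|Y_i \,\|\, \bar q_i(\cdot|Y_i)\bigr)
\le \frac1n \cdot \frac{4C}{\alpha} \cdot \sum_{i=1}^n \sum_{j \ne i}
\mathbb E\bigl|p_j(\cdot|\bar Y_j) - q_j(\cdot|\bar Y_j)\bigr|^2
= \frac{n-1}{n} \cdot \frac{4C}{\alpha} \cdot \sum_{j=1}^n
\mathbb E\bigl|p_j(\cdot|\bar Y_j) - q_j(\cdot|\bar Y_j)\bigr|^2.$$  (2.3)

To estimate the first term in the right-hand-side of (2.2), observe that the
definition of $\alpha$ implies that for any $i \in [1,n]$ and $x \in \mathcal X$,
$\Pr\{X_i = x\} \ge \alpha$. Thus by Lemma 2 we have

$$D(Y_i\|X_i) \le \frac4\alpha \cdot \bigl|\mathcal L(Y_i) - \mathcal L(X_i)\bigr|^2.$$  (2.4)

Further, condition (1.3) implies

$$\sum_{i=1}^n \bigl|\mathcal L(Y_i) - \mathcal L(X_i)\bigr|^2
\le \sum_{i=1}^n \Pr\nolimits_\pi^2\{Y_i \ne X_i\} = W_2^2(p^n, q^n)
\le C \cdot \mathbb E \sum_{i=1}^n \bigl|p_i(\cdot|\bar Y_i) - q_i(\cdot|\bar Y_i)\bigr|^2.$$  (2.5)

Putting together (2.4) and (2.5), it follows that the first term on the right-hand-side
of (2.2) can be bounded as follows:

$$\frac1n \cdot \sum_{i=1}^n D(Y_i\|X_i) \le \frac1n \cdot \frac{4C}{\alpha} \cdot
\sum_{i=1}^n \mathbb E\bigl|p_i(\cdot|\bar Y_i) - q_i(\cdot|\bar Y_i)\bigr|^2.$$  (2.6)

Substituting (2.3) and (2.6) into (2.2) we get the first inequality in (1.4). The second
inequality follows from the Pinsker-Csiszar-Kullback inequality. □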
3. Proof of Theorem 2.

Let both $p^n$ and $q^n$ be fixed. We want to show that (1.3) holds with
$C = 1/(1 - \|A\|_2)^2$, where $A$ is the coupling matrix for $q^n$. It is enough to
prove this for $I = [1,n]$, since for any $I \subset [1,n]$ and $\bar y_I$ the
conditional distribution $q_I(\cdot|\bar y_I)$ satisfies Dobrushin's uniqueness
condition with a minor of $A$ as a coupling matrix. (The idea of the proof for
$I = [1,n]$ goes back to Dobrushin's papers [D1], [D2], although he worked with
another matrix norm.)

We are going to prove that Dobrushin's uniqueness condition implies that the
Gibbs sampler $\Gamma$ is a contraction with respect to the $W_2$-distance with rate
$1 - (1/n) \cdot (1 - \|A\|_2)$.

To achieve this, let $r^n$ and $s^n$ be two probability measures on $\mathcal X^n$, and
let $U^n$ and $Z^n$ be random sequences representing $r^n$ and $s^n$, respectively.
(I.e., $r^n = \mathcal L(U^n)$, $s^n = \mathcal L(Z^n)$.) Let the coupling
$\mathcal L(U^n, Z^n)$ achieve $W_2(r^n, s^n)$.

Select an index $i \in [1,n]$ at random, and define

$$U'_k = U_k, \quad Z'_k = Z_k \quad\text{for } k \ne i.$$

Then define $\mathcal L(U'_i, Z'_i \mid \bar U_i = \bar u_i, \bar Z_i = \bar z_i)$ as
that coupling of $q_i(\cdot|\bar u_i)$ and $q_i(\cdot|\bar z_i)$ that achieves
$|q_i(\cdot|\bar u_i) - q_i(\cdot|\bar z_i)|$. It is clear that
$\mathcal L(U'^n) = r^n\Gamma$, and $\mathcal L(Z'^n) = s^n\Gamma$.

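By the definition of the coupling matrix we have

$$\Pr\{U'_i \ne Z'_i\} \le (1 - 1/n) \cdot \Pr\{U_i \ne Z_i\}
+ 1/n \cdot \sum_{k \ne i} a_{k,i} \cdot \Pr\{U_k \ne Z_k\}.$$

It follows that

$$\sum_{i=1}^n \Pr\nolimits^2\{U'_i \ne Z'_i\} \le \|B\|_2^2 \cdot \sum_{i=1}^n \Pr\nolimits^2\{U_i \ne Z_i\},$$

where

$$B = (1 - 1/n) \cdot I_n + 1/n \cdot A.$$

Thus

$$\sqrt{\sum_{i=1}^n \Pr\nolimits^2\{U'_i \ne Z'_i\}} \le
\Bigl(1 - \frac1n \cdot (1 - \|A\|_2)\Bigr) \cdot \sqrt{\sum_{i=1}^n \Pr\nolimits^2\{U_i \ne Z_i\}}.$$

This proves the contractivity of $\Gamma$ with rate $1 - 1/n \cdot (1 - \|A\|_2)$.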


By the triangle inequality

$$W_2(p^n, q^n) \le W_2(p^n, p^n\Gamma) + W_2(p^n\Gamma, q^n).$$

By contractivity of $\Gamma$, and since $q^n$ is invariant with respect to $\Gamma$, it
follows that

$$W_2(p^n, q^n) \le W_2(p^n, p^n\Gamma) + \bigl(1 - 1/n \cdot (1 - \|A\|_2)\bigr) \cdot W_2(p^n, q^n),$$

i.e.,

$$W_2(p^n, q^n) \le \frac{n}{1 - \|A\|_2} \cdot W_2(p^n, p^n\Gamma).$$

But it is easy to see that

$$W_2(p^n, p^n\Gamma) \le \frac1n \cdot \sqrt{\sum_{i=1}^n \mathbb E\bigl|p_i(\cdot|\bar Y_i) - q_i(\cdot|\bar Y_i)\bigr|^2}.$$

By the last two inequalities, (1.3) (for $I = [1,n]$), and hence Theorem 2, is proved.
□

4. Gibbs measures with the strong mixing property. 

4.1. Definitions, notation and statement of Theorem 3. 

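In this section we work with measures on $\mathcal X^\Lambda$ where $\Lambda$ is a
subset of the $d$-dimensional cubic lattice $\mathbb Z^d$. Most of the time $\Lambda$
shall be finite.

The lattice points in $\mathbb Z^d$ shall be called sites. The distance $\rho$ on
$\mathbb Z^d$ is

$$\rho(k,i) = \max_\nu |k_\nu - i_\nu|, \quad\text{where } k = (k_1, k_2, \dots, k_d),\ i = (i_1, i_2, \dots, i_d).$$

The notation $\Lambda \subset\subset \mathbb Z^d$ expresses that $\Lambda$ is a finite
subset of $\mathbb Z^d$.

The elements of $\mathcal X$ are called spins, and the elements of the set
$\mathcal X^\Lambda$ ($\Lambda \subset \mathbb Z^d$, possibly infinite) are called spin
configurations, or just configurations, over $\Lambda$.

We consider an ensemble of conditional distributions $q_\Lambda(\cdot|x_{\bar\Lambda})$,
where $\Lambda \subset\subset \mathbb Z^d$, and $\bar\Lambda$ is the complement of
$\Lambda$. We prefer to write $x_{\bar\Lambda}$ in place of $\bar x_\Lambda$, and,
accordingly, $q_\Lambda(\cdot|x_{\bar\Lambda})$ in place of $q_\Lambda(\cdot|\bar x_\Lambda)$.
The measure $q_\Lambda(\cdot|x_{\bar\Lambda})$ is considered as the conditional
distribution of a random spin configuration over $\Lambda$, given the spin
configuration outside of $\Lambda$. For a site $i \in \mathbb Z^d$ we use the notation
$q_i(\cdot|\bar x_i)$.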

The conditional distribution $q_\Lambda(\cdot|x_{\bar\Lambda})$
($\Lambda \subset\subset \mathbb Z^d$, $x_{\bar\Lambda} \in \mathcal X^{\bar\Lambda}$)
naturally defines the conditional distributions $q_M(\cdot|x_{\bar M})$ for any
$M \subset \Lambda$. We assume that the conditional distributions
$q_\Lambda(\cdot|x_{\bar\Lambda})$ satisfy the natural compatibility conditions. The
conditional distribution $q_\Lambda(\cdot|x_{\bar\Lambda})$ also defines, for
$M \subset \Lambda$, the conditional distribution $q_M(\cdot|x_{\bar\Lambda})$.

If the compatibility conditions hold then there exists at least one probability
measure $q = \mathcal L(X)$ on the space of configurations $\mathcal X^{\mathbb Z^d}$,
compatible with the conditional distributions $q_\Lambda(\cdot|x_{\bar\Lambda})$:

$$\mathcal L(X_\Lambda \mid X_{\bar\Lambda} = x_{\bar\Lambda}) = q_\Lambda(\cdot|x_{\bar\Lambda}).$$

Here $X_\Lambda$ denotes the marginal of the random configuration $X$ for the sites in
$\Lambda$, and $X_{\bar\Lambda}$ is called an outside configuration for $\Lambda$. The
conditional distributions $q_\Lambda(\cdot|x_{\bar\Lambda})$ are called the local
specifications of $q$, and $q$ is called a Gibbs measure compatible with the local
specifications $q_\Lambda(\cdot|x_{\bar\Lambda})$.

We say that the ensemble of local specifications $q_\Lambda(\cdot|x_{\bar\Lambda})$ has
finite range of interaction $R$ (or is Markov of order $R$) if
$q_\Lambda(\cdot|x_{\bar\Lambda})$ only depends on those coordinates $x_k$
($k \in \bar\Lambda$) that are in the $R$-neighborhood of $\Lambda$.

In general, the local specifications do not uniquely determine the Gibbs measure.
The question of uniqueness has been extensively studied in the case of local
specifications with finite range of interaction, and a sufficient condition for
uniqueness was given by R. Dobrushin and S. Shlosman [D-Sh1]. But the general question
of uniqueness is open, even for measures with finite range of interaction.

A property stronger than uniqueness is strong mixing.

In their celebrated paper [D-Sh2] in 1987, R. Dobrushin and S. Shlosman gave a
characterization of complete analyticity of Markov Gibbs measures over
$\mathbb Z^d$. Their characterization was formulated in twelve conditions which were
proved to be equivalent, and are referred to as Dobrushin and Shlosman's strong
mixing conditions. The following definition is the same as one of these twelve
(III G), except that we do not assume Markovity, and replace the function
$K \cdot \exp(-\gamma r)$ by a more general function $\varphi(r)$. In the Markov case
$\varphi(r)$ necessarily shall have the form $K \cdot \exp(-\gamma r)$.

In order to define strong mixing, let $\varphi : \mathbb Z_+ \to \mathbb R_+$ be a
function satisfying

$$\sum_{i \in \mathbb Z^d} \varphi(\rho(0,i)) < \infty.$$  (4.1.1)


Definition: Strong mixing. The Gibbs measure $q$ is called strongly mixing with
coupling function $\varphi$ if for any sets $M \subset \Lambda \subset\subset \mathbb Z^d$
and any two outside configurations $y_{\bar\Lambda}$ and $z_{\bar\Lambda}$ differing only
at one single site $k \notin \Lambda$:

$$|q_M(\cdot|y_{\bar\Lambda}) - q_M(\cdot|z_{\bar\Lambda})| \le \varphi(\rho(k,M)).$$  (4.1.2)

For stationary probability measures on $\mathcal X^{\mathbb Z}$ this definition is more
restrictive than usual strong mixing, and is equivalent to $\Phi$-mixing. On
$\mathbb Z^d$ the term strong mixing has been only used for Markov fields, and for
simplicity we extend its use without adding any qualification.


Our aim in this section is to prove the following
Theorem 3.

Assume that the ensemble $q_\Lambda(\cdot|x_{\bar\Lambda})$ satisfies the strong mixing
condition with coupling function $\varphi$. Moreover, assume that

$$\alpha = \inf q_i(x_i|\bar x_i) > 0,$$

where the infimum is taken for all $x \in \mathcal X^{\mathbb Z^d}$ and
$i \in \mathbb Z^d$ such that $q_i(x_i|\bar x_i) > 0$.
Then, for fixed $\Lambda \subset\subset \mathbb Z^d$ and outside configuration
$y_{\bar\Lambda}$, the conditional distribution $q_\Lambda(\cdot|y_{\bar\Lambda})$, as a
measure on $\mathcal X^\Lambda$, satisfies condition (1.3) of Theorem 1, with a constant
$C$ independent of $\Lambda$ and $y_{\bar\Lambda}$. Moreover, it is enough to assume
(4.1.2) for sets $\Lambda$ of diameter at most $m_0$, where $m_0$ depends on the
dimension $d$ and the function $\varphi$. The constant $C$ depends on the dimension
$d$, the function $\varphi$ and on $\alpha$.

Remark. If $q$ has finite range of interaction then Theorem 3 implies that condition
(4.1.2) is constructive, in the sense of Dobrushin and Shlosman.

There is another approach to strong mixing, for measures $q$ on
$\mathcal X^{\mathbb Z^d}$ with finite range of interaction. This approach was
developed by E. Olivieri, P. Picco and F. Martinelli; c.f. [M-O1]. Their aim was to
replace the above condition of strong mixing ((4.1.2)) by a milder one, requiring
(4.1.2) only for "non-pathological" sets $\Lambda$, i.e. for sets whose boundary is
much smaller than their volume. Martinelli and Olivieri [M-O2] proved a logarithmic
Sobolev inequality under this modified condition, for measures $q$ with finite range
of interaction. In Appendix B we briefly sketch the Olivieri-Picco-Martinelli
approach, and how to modify Theorem 1 and the Auxiliary Theorem (below), to get
logarithmic Sobolev inequalities under this weaker assumption.

4.2. Proof of Theorem 3 

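Consider the infinite symmetric matrix

$$\Phi = \bigl(\varphi(\rho(k,i))\bigr)_{k,i \in \mathbb Z^d}.$$

Since the entries are non-negative, and the row-sums equal, $\|\Phi\|$ equals the
row-sum:

$$\|\Phi\| = \sum_{i \in \mathbb Z^d} \varphi(\rho(0,i)).$$

Fix a $\Lambda \subset\subset \mathbb Z^d$, an outside configuration $y_{\bar\Lambda}$
and a $p_\Lambda \in \mathcal P(\mathcal X^\Lambda)$. It is enough to prove that

$$W_2^2\bigl(p_\Lambda, q_\Lambda(\cdot|y_{\bar\Lambda})\bigr) \le
C \cdot \mathbb E \sum_{i \in \Lambda} W_2^2\bigl(p_i(\cdot|\bar Y_i), q_i(\cdot|\bar Y_i)\bigr),$$  (4.2.1)

(with $C$ independent of $\Lambda$ and $y_{\bar\Lambda}$), since for any
$M \subset \Lambda$ and any fixed $y_{\Lambda\setminus M}$, the conditional distribution
$q_M(\cdot|y_{\bar M})$ (where $y_{\bar M} = (y_{\Lambda\setminus M}, y_{\bar\Lambda})$)
satisfies the strong mixing condition with the same function $\varphi$.

We start with a weaker version of (4.2.1).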
Notation.

Let $\mathcal I_m = \mathcal I_m(\Lambda)$ denote the set of $m$-sided cubes in
$\mathbb Z^d$ that intersect $\Lambda$. Set

$$\Theta_m = \min_R \Bigl\{ \|\Phi\| \cdot \frac{d \cdot R}{m}
+ 2d \cdot \sum_{r=R}^\infty (2r+1)^{d-1} \varphi(r) \Bigr\}.$$  (4.2.2)

Note that we can achieve

$$\Theta_m < 1,$$  (4.2.3)

by selecting $R$ large enough to make the second term in (4.2.2) small, and then
selecting $m$.
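To get a feeling for how large $m$ must be, the quantity in (4.2.2) can be evaluated numerically. The sketch below reads (4.2.2) as the minimum over $R$ of $\|\Phi\| \cdot dR/m + 2d \sum_{r \ge R} (2r+1)^{d-1}\varphi(r)$, with $\|\Phi\| = \sum_{i} \varphi(\rho(0,i))$. It makes several assumptions that are not in the text: the infinite sums are truncated at a hypothetical cutoff `r_max`, $R$ ranges over $0, \dots, $ `r_max`, and the example coupling function is an arbitrary exponential.

```python
import math

def phi_norm(phi, d, r_max):
    """Truncated row sum of the matrix (phi(rho(k,i))): the shell at max-norm
    distance r >= 1 from the origin contains (2r+1)^d - (2r-1)^d sites."""
    total = phi(0)
    for r in range(1, r_max + 1):
        total += ((2 * r + 1) ** d - (2 * r - 1) ** d) * phi(r)
    return total

def theta(m, d, phi, r_max=200):
    """Approximate value of the minimum in (4.2.2), with sums cut at r_max."""
    norm = phi_norm(phi, d, r_max)
    # tail[R] = sum_{r=R}^{r_max} (2r+1)^(d-1) * phi(r), built from the far end
    tail = [0.0] * (r_max + 2)
    for r in range(r_max, -1, -1):
        tail[r] = tail[r + 1] + (2 * r + 1) ** (d - 1) * phi(r)
    return min(norm * d * R / m + 2 * d * tail[R] for R in range(r_max + 1))

def example_phi(r):
    """Hypothetical coupling function K * exp(-gamma * r), K = 0.5, gamma = 1."""
    return 0.5 * math.exp(-r)
```

Since only the first term depends on $m$, and it decreases as $m$ grows, the computed value is non-increasing in $m$; one can scan $m$ upward until it drops below 1.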

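Auxiliary Theorem. If $m$ is so large that $\Theta_m < 1$ then

$$W_2^2\bigl(p_\Lambda, q_\Lambda(\cdot|y_{\bar\Lambda})\bigr)
\le \frac{1}{m^d \cdot (1 - \Theta_m)^2} \cdot \sum_{I \in \mathcal I_m}
\mathbb E\, W_2^2\bigl(p_{I\cap\Lambda}(\cdot|\bar Y_{I\cap\Lambda}), q_{I\cap\Lambda}(\cdot|\bar Y_{I\cap\Lambda})\bigr)$$
$$\le \frac{1}{(1 - \Theta_m)^2} \cdot \sum_{I \in \mathcal I_m}
\mathbb E\bigl|p_{I\cap\Lambda}(\cdot|\bar Y_{I\cap\Lambda}) - q_{I\cap\Lambda}(\cdot|\bar Y_{I\cap\Lambda})\bigr|^2.$$  (4.2.4)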

If the ensemble $q_\Lambda(\cdot|x_{\bar\Lambda})$ has finite range of interaction $R$
then the Auxiliary Theorem holds with $\|\Phi\| \cdot d \cdot R/m$ in place of
$\Theta_m$.

The second inequality in (4.2.4) follows from the first one by the trivial inequality

$$W_2^2(r^n, s^n) \le n \cdot |r^n - s^n|^2 \quad\text{for } r^n, s^n \in \mathcal P(\mathcal X^n).$$


The proof of the Auxiliary Theorem follows that of Theorem 2, but we use a
more general Gibbs sampler, updating (the intersection of $\Lambda$ with) an $m$-sided
cube at a time, not just one site. Let us extend the definition of $p_\Lambda$ so that
on $\bar\Lambda$ it be concentrated on the fixed $y_{\bar\Lambda}$.


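Definition.

For $I \in \mathcal I_m$ let
$\Gamma_I : \mathcal P(\mathcal X^\Lambda) \to \mathcal P(\mathcal X^\Lambda)$ be the
Markov kernel:

$$\Gamma_I(z_\Lambda|y_\Lambda) = \delta_{y_{\Lambda\setminus I},\, z_{\Lambda\setminus I}} \cdot
q_{I\cap\Lambda}(z_{I\cap\Lambda}|\bar y_{I\cap\Lambda}).$$

(For $k \in \bar\Lambda$, $y_k$ is defined by the fixed $y_{\bar\Lambda}$.) Then set

$$\Gamma_{\mathcal I_m} = \frac{1}{|\mathcal I_m|} \sum_{I \in \mathcal I_m} \Gamma_I.$$

Then $\Gamma_{\mathcal I_m}$ preserves, and is reversible with respect to,
$q_\Lambda(\cdot|y_{\bar\Lambda})$. We call $\Gamma_{\mathcal I_m}$ the Gibbs sampler for
measure $q_\Lambda(\cdot|y_{\bar\Lambda})$, defined by the local specifications
$q_{I\cap\Lambda}(\cdot|\bar y_{I\cap\Lambda})$, $I \in \mathcal I_m$.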

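Proof of the Auxiliary Theorem.

To estimate $W_2(p_\Lambda, q_\Lambda(\cdot|y_{\bar\Lambda}))$, we are going to prove
that if (4.2.3) holds then the Gibbs sampler $\Gamma_{\mathcal I_m}$ is a contraction
with respect to the $W_2$-distance, with rate
$1 - m^d/|\mathcal I_m| \cdot (1 - \Theta_m)$.

To achieve this, let $r$ and $s$ be two probability measures on $\mathcal X^\Lambda$
and let $Y$ and $Z$ be random variables representing $r$ and $s$, respectively. (I.e.,
$r = \mathcal L(Y)$, $s = \mathcal L(Z)$.) Let the coupling $\mathcal L(Y, Z)$ of $r$
and $s$ achieve $W_2(r, s)$. We extend the definition of $\mathcal L(Y, Z)$, letting
$Y_{\bar\Lambda} = Z_{\bar\Lambda} = y_{\bar\Lambda}$, where $y_{\bar\Lambda}$ is the
fixed outside configuration. Let $Y'$ and $Z'$ be random variables representing
$r\Gamma_{\mathcal I_m}$ and $s\Gamma_{\mathcal I_m}$.


Suppose that, when carrying out one step in the Gibbs sampler $\Gamma_{\mathcal I_m}$,
we have selected a certain $I \in \mathcal I_m$. Then we can assume that

$$Y'_i = Y_i \quad\text{and}\quad Z'_i = Z_i \quad\text{for all } i \in \Lambda \setminus I.$$

Moreover,

$$\mathcal L(Y'_{I\cap\Lambda} \mid Y_{\Lambda\setminus I} = y_{\Lambda\setminus I}) = q_{I\cap\Lambda}(\cdot|\bar y_{I\cap\Lambda}),$$

and

$$\mathcal L(Z'_{I\cap\Lambda} \mid Z_{\Lambda\setminus I} = z_{\Lambda\setminus I}) = q_{I\cap\Lambda}(\cdot|\bar z_{I\cap\Lambda}).$$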

At this point we need the following 
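Lemma 3. (The proof is in Appendix A.)

Let us fix the set $M \subset\subset \mathbb Z^d$, together with two outside
configurations $y_{\bar M}$ and $z_{\bar M}$, differing only at site $k \notin M$. Let
$Y$ and $Z$ be random variables realizing $q_M(\cdot|y_{\bar M})$ and
$q_M(\cdot|z_{\bar M})$. Define

$$J_i = J_{k,M,i} = \{j \in M : \rho(k,j) \ge \rho(k,i)\}.$$  (4.2.5)

Then there exists a coupling $\pi = \mathcal L(Y, Z \mid y_{\bar M}, z_{\bar M})$ of
$\mathcal L(Y)$ and $\mathcal L(Z)$, satisfying

$$\Pr\nolimits_\pi\{Y_i \ne Z_i\} \le |q_{J_i}(\cdot|y_{\bar M}) - q_{J_i}(\cdot|z_{\bar M})|, \quad i \in M.$$

If $q$ satisfies the strong mixing condition with function $\varphi$ then, for this
coupling,

$$\Pr\nolimits_\pi\{Y_i \ne Z_i\} \le \varphi(\rho(k,i)) \quad\text{for all } i \in M.$$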


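By Lemma 3, for fixed $I$, $\bar y_{I\cap\Lambda}$ and $\bar z_{I\cap\Lambda}$, we can
define a coupling

$$\pi_{I\cap\Lambda}(\cdot \mid \bar y_{I\cap\Lambda}, \bar z_{I\cap\Lambda})
= \mathcal L(Y'_{I\cap\Lambda}, Z'_{I\cap\Lambda} \mid \bar Y_{I\cap\Lambda} = \bar y_{I\cap\Lambda},
\bar Z_{I\cap\Lambda} = \bar z_{I\cap\Lambda}),$$

satisfying

$$\Pr\nolimits_\pi\{Y'_i \ne Z'_i \mid \bar Y_{I\cap\Lambda} = \bar y_{I\cap\Lambda}, \bar Z_{I\cap\Lambda} = \bar z_{I\cap\Lambda}\}
\le \sum_{k \in \Lambda\setminus I} \mathbf 1\{y_k \ne z_k\} \cdot \varphi(\rho(k,i)),
\quad\text{for all } i \in I \cap \Lambda.$$

Thus

$$\Pr\{Y'_i \ne Z'_i \mid I \text{ selected}\}
\le \sum_{k \in \Lambda\setminus I} \Pr\{Y_k \ne Z_k\} \cdot \varphi(\rho(k,i))
\quad\text{for all } i \in I \cap \Lambda.$$  (4.2.6)

We calculate $\Pr\{Y'_i \ne Z'_i\}$ by averaging for $I \in \mathcal I_m$. Set
$N = |\mathcal I_m|$. Since each $i \in \Lambda$ is covered by exactly $m^d$ cubes from
$\mathcal I_m$, (4.2.6) implies

$$\Pr\{Y'_i \ne Z'_i\} \le \Bigl(1 - \frac{m^d}{N}\Bigr) \cdot \Pr\{Y_i \ne Z_i\}
+ \frac1N \cdot \sum_{I \ni i}\ \sum_{k \in \Lambda\setminus I} \Pr\{Y_k \ne Z_k\} \cdot \varphi(\rho(k,i)).$$  (4.2.7)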


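Consider the vectors

$$u = \bigl(\Pr\{Y_k \ne Z_k\}\bigr)_{k \in \Lambda} \quad\text{and}\quad
v = \bigl(\Pr\{Y'_i \ne Z'_i\}\bigr)_{i \in \Lambda},$$

and let $D$ denote the matrix with entries

$$d_{k,i} = \sum_{I \ni i,\ \Lambda\setminus I \ni k} \varphi(\rho(k,i)), \quad k, i \in \Lambda.$$

With this notation, (4.2.7) means that

$$v \le \Bigl(1 - \frac{m^d}{N}\Bigr) \cdot u + \frac1N \cdot uD$$

coordinatewise (with $u$ and $v$ regarded as row vectors), thus

$$\|v\| \le \Bigl(\Bigl(1 - \frac{m^d}{N}\Bigr) + \frac1N \cdot \|D\|\Bigr) \cdot \|u\|.$$  (4.2.8)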


We claim that

$$\sum_{I :\, k \notin I,\ I \ni i} 1 \le \min\{d \cdot m^{d-1} \cdot \rho(k,i),\ m^d\}.$$

Indeed, there are $d$ lattice-hyperplanes separating $k$ and $i$, and there is exactly
one among these that intersects the line segment (in $\mathbb R^d$) connecting $k$ and
$i$. These facts imply that an $m$-sided cube can be placed in at most
$d \cdot m^{d-1} \cdot \rho(k,i)$ ways so as to satisfy both conditions $k \notin I$
and $I \ni i$. It follows that

$$d_{k,i} \le m^d \cdot \varphi(\rho(k,i)) \cdot \min\Bigl\{\frac{d \cdot \rho(k,i)}{m},\ 1\Bigr\}.$$  (4.2.9)
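The counting bound behind (4.2.9), namely that the number of $m$-sided cubes containing site $i$ but not site $k$ is at most $\min\{d \cdot m^{d-1} \cdot \rho(k,i),\ m^d\}$, can be checked by brute force in small cases. The sketch below assumes cubes are identified by their lowest corner and sites are integer tuples; the function names are illustrative.

```python
from itertools import product

def rho(k, i):
    """Max-norm distance between two lattice sites."""
    return max(abs(a - b) for a, b in zip(k, i))

def cubes_containing_i_not_k(i, k, m):
    """Count m-sided cubes (given by their lowest corner) that contain i but not k."""
    d = len(i)
    count = 0
    # every cube containing i has its corner in [i_v - m + 1, i_v] in each direction
    for corner in product(*(range(c - m + 1, c + 1) for c in i)):
        contains_k = all(corner[v] <= k[v] <= corner[v] + m - 1 for v in range(d))
        if not contains_k:
            count += 1
    return count

def claimed_bound(i, k, m):
    """The bound min{d * m^(d-1) * rho(k,i), m^d} used for (4.2.9)."""
    d = len(i)
    return min(d * m ** (d - 1) * rho(k, i), m ** d)
```

In closed form the count is $m^d - \prod_\nu \max(0, m - |k_\nu - i_\nu|)$, which makes the bound a consequence of the mean value estimate $m^d - (m-\rho)^d \le d\, m^{d-1} \rho$.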

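Since the right-hand-side of (4.2.9) is symmetric in $k$ and $i$, we have

$$\|D\| \le m^d \cdot \max_k \sum_{i} \varphi(\rho(k,i)) \cdot \min\Bigl\{\frac{d \cdot \rho(k,i)}{m},\ 1\Bigr\}.$$  (4.2.10)

Now fix an $R$, and divide the sum in (4.2.10) into two parts, for $i$ satisfying
$\rho(k,i) < R$ and $\rho(k,i) \ge R$, respectively. We see that

$$\|D\| \le m^d \cdot \Bigl(\|\Phi\| \cdot \frac{d \cdot R}{m} + \sum_{i :\, \rho(k,i) \ge R} \varphi(\rho(k,i))\Bigr)
\le m^d \cdot \Bigl(\|\Phi\| \cdot \frac{d \cdot R}{m} + 2d \cdot \sum_{r=R}^\infty (2r+1)^{d-1} \varphi(r)\Bigr).$$

Taking minimum in $R$, we get

$$\|D\| \le m^d \cdot \Theta_m.$$  (4.2.11)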

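By (4.2.8) and the definition of the vectors $u$ and $v$, (4.2.11) implies that

$$\sqrt{\sum_{i \in \Lambda} \Pr\nolimits^2\{Y'_i \ne Z'_i\}} \le
\Bigl(1 - \frac{m^d}{N} \cdot (1 - \Theta_m)\Bigr) \cdot \sqrt{\sum_{k \in \Lambda} \Pr\nolimits^2\{Y_k \ne Z_k\}},$$

i.e.,

$$W_2(r\Gamma_{\mathcal I_m}, s\Gamma_{\mathcal I_m}) \le
\Bigl(1 - \frac{m^d}{N} \cdot (1 - \Theta_m)\Bigr) \cdot W_2(r, s).$$  (4.2.12)

The stated contractivity is proved.

By the triangle inequality it follows that

$$W_2\bigl(p_\Lambda, q_\Lambda(\cdot|y_{\bar\Lambda})\bigr) \le
W_2(p_\Lambda, p_\Lambda\Gamma_{\mathcal I_m}) + W_2\bigl(p_\Lambda\Gamma_{\mathcal I_m}, q_\Lambda(\cdot|y_{\bar\Lambda})\bigr)$$
$$\le W_2(p_\Lambda, p_\Lambda\Gamma_{\mathcal I_m}) +
\Bigl(1 - \frac{m^d}{N} \cdot (1 - \Theta_m)\Bigr) \cdot W_2\bigl(p_\Lambda, q_\Lambda(\cdot|y_{\bar\Lambda})\bigr),$$

whence

$$W_2\bigl(p_\Lambda, q_\Lambda(\cdot|y_{\bar\Lambda})\bigr) \le
\frac{N}{m^d \cdot (1 - \Theta_m)} \cdot W_2(p_\Lambda, p_\Lambda\Gamma_{\mathcal I_m}).$$  (4.2.13)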





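To complete the proof of the Auxiliary Theorem, we have to estimate
$W_2(p_\Lambda, p_\Lambda\Gamma_{\mathcal I_m})$ in terms of the quantities

$$W_2\bigl(p_{I\cap\Lambda}(\cdot|\bar y_{I\cap\Lambda}), q_{I\cap\Lambda}(\cdot|\bar y_{I\cap\Lambda})\bigr).$$

To do this, fix an $I \in \mathcal I_m$, together with a sequence
$y_{\Lambda\setminus I} \in \mathcal X^{\Lambda\setminus I}$, and define a coupling
$\pi_{I\cap\Lambda}(\cdot|y_{\Lambda\setminus I})$ of
$p_{I\cap\Lambda}(\cdot|\bar y_{I\cap\Lambda})$ and
$q_{I\cap\Lambda}(\cdot|\bar y_{I\cap\Lambda})$ that achieves $W_2$-distance.
We extend $\pi_{I\cap\Lambda}(\cdot|y_{\Lambda\setminus I})$ to a measure on
$\mathcal X^\Lambda \times \mathcal X^\Lambda$ concentrated on the diagonal
$(y_{\Lambda\setminus I}, y_{\Lambda\setminus I})$ for coordinates outside of $I$.
Finally, we define the coupling $\pi$ of $p_\Lambda$ and
$p_\Lambda\Gamma_{\mathcal I_m}$ by averaging the distributions
$\pi_{I\cap\Lambda}(\cdot|y_{\Lambda\setminus I})$ with respect to $I$ and
$y_{\Lambda\setminus I}$.

Using this construction, an easy computation (using the Cauchy-Schwarz
inequality) shows that

$$W_2^2(p_\Lambda, p_\Lambda\Gamma_{\mathcal I_m}) \le \frac{m^d}{|\mathcal I_m|^2} \cdot
\sum_{I \in \mathcal I_m} \mathbb E\, W_2^2\bigl(p_{I\cap\Lambda}(\cdot|\bar Y_{I\cap\Lambda}), q_{I\cap\Lambda}(\cdot|\bar Y_{I\cap\Lambda})\bigr).$$  (4.2.14)

Substituting (4.2.14) into (4.2.13), we get the first inequality in (4.2.4).
Understanding the proof one easily sees that the statement for Gibbs measures with
finite range of interaction holds true. The Auxiliary Theorem is proved. □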

To complete the proof of Theorem 3 we have to deduce (4.2.1) from the Auxiliary 
Theorem. To do this we need the following 

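Lemma 4. (The proof is in Appendix A.)

Let $p^n = \mathcal L(Y^n)$ and $q^n$ be two measures on $\mathcal X^n$. Let $\alpha$
be defined by (1.1). Then

$$|p^n - q^n|^2 \le \Bigl(\frac{2}{(|\mathcal X| \cdot \alpha)^2}\Bigr)^{n + \log_2 n} \cdot
\sum_{i=1}^n \mathbb E\bigl|p_i(\cdot|\bar Y_i) - q_i(\cdot|\bar Y_i)\bigr|^2.$$


Using Lemma 4, we estimate the terms in the last sum in (4.2.4). We get

$$W_2^2\bigl(p_\Lambda, q_\Lambda(\cdot|y_{\bar\Lambda})\bigr) \le
\frac{m^d}{(1 - \Theta_m)^2} \cdot \Bigl(\frac{2}{(|\mathcal X| \cdot \alpha)^2}\Bigr)^{m^d + \log_2 m^d} \cdot
\mathbb E \sum_{i \in \Lambda} \bigl|p_i(\cdot|Y_{\Lambda\setminus i}) - q_i(\cdot|Y_{\Lambda\setminus i}, y_{\bar\Lambda})\bigr|^2.$$

Thus (4.2.1) is fulfilled with

$$C = \frac{m^d}{(1 - \Theta_m)^2} \cdot \Bigl(\frac{2}{(|\mathcal X| \cdot \alpha)^2}\Bigr)^{m^d + \log_2 m^d},$$

as soon as $m$ is large enough for $\Theta_m < 1$.

We used the strong mixing condition (4.1.2) in proving Lemma 3, and Lemma 3
was used for subsets of $m$-sided cubes. It was enough to consider $m$-sided cubes with
$m$ so large that $\Theta_m < 1$ holds, a condition depending on $d$ and $\varphi$. This
proves the last two statements of Theorem 3. □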

Remark. An argument similar to the use of Lemma 4 was also there in [S-Z2]. 
Appendix A

Proof of Lemma 1. (This proof was suggested to the author by M. Raginsky [R].)


We use the notions of Hellinger distance and Hellinger affinity:

$$H(r,s) = \Bigl(\sum_{x \in \mathcal X} \bigl|\sqrt{r(x)} - \sqrt{s(x)}\bigr|^2\Bigr)^{1/2}
\quad\text{and}\quad A(r,s) = \sum_{x \in \mathcal X} \sqrt{r(x) \cdot s(x)}.$$

The statement of the lemma can be formulated as

$$|r - s|^2 \le 1 - A^2(r,s).$$  (A.1)

To prove (A.1), we use the identity

$$H^2(r,s) = 2(1 - A(r,s)).$$

(A.1) is now proved by the following chain of equalities and inequalities:

$$|r - s|^2 = \frac14 \Bigl(\sum_{x \in \mathcal X} \bigl|\sqrt{r(x)} - \sqrt{s(x)}\bigr| \cdot \bigl(\sqrt{r(x)} + \sqrt{s(x)}\bigr)\Bigr)^2$$
$$\le \frac14 \sum_{x \in \mathcal X} \bigl|\sqrt{r(x)} - \sqrt{s(x)}\bigr|^2 \cdot
\sum_{x \in \mathcal X} \bigl(\sqrt{r(x)} + \sqrt{s(x)}\bigr)^2$$
$$= \frac14 H^2(r,s) \cdot 2(1 + A(r,s))
= (1 - A(r,s)) \cdot (1 + A(r,s)) = 1 - A^2(r,s).$$

(The inequality follows from the Cauchy-Schwarz inequality.)

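Proof of Lemma 3.

Order the elements of $M$ so that

$$\rho(k, i_1) \le \rho(k, i_2) \le \dots \le \rho(k, i_{|M|}),$$

i.e., the sequence of sets $J_i = J_{k,M,i}$ (c.f. (4.2.5)) is decreasing in $i$. Let
$Y_{J_i}$ and $Z_{J_i}$ denote the marginals of $Y$ and $Z$, respectively, for the sites
in $J_i$. Then $(Y_{J_{i_1}}, Y_{J_{i_2}}, \dots)$ and $(Z_{J_{i_1}}, Z_{J_{i_2}}, \dots)$
are Markov chains. (In fact, $Y_{J_{i_{j+1}}}$ is a function of $Y_{J_{i_j}}$.)
Therefore, by a theorem of Goldstein [Go], there exists a coupling
$\pi = \mathcal L(Y, Z \mid y_{\bar M}, z_{\bar M})$ of $\mathcal L(Y)$ and
$\mathcal L(Z)$, satisfying

$$\Pr\nolimits_\pi\{Y_{J_i} \ne Z_{J_i}\} = |\mathcal L(Y_{J_i}) - \mathcal L(Z_{J_i})|
= |q_{J_i}(\cdot|y_{\bar M}) - q_{J_i}(\cdot|z_{\bar M})|.$$

Since $i \in J_i$, and $\rho(k, i) = \rho(k, J_i)$, the statement of Lemma 3 follows.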

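Proof of Lemma 4.

Note first that if $r$ and $s$ are probability measures on $\mathcal X$, and
$r(x), s(x) \ge \alpha$ then

$$|r - s| \le 1 - |\mathcal X| \cdot \alpha.$$

Now consider measures $p^2 = \mathcal L(Y_1, Y_2)$ and $q^2$ on a product space
$\mathcal Y \times \mathcal Z$, where $q_2(z_2|y_1) \ge \alpha_2$, and
$q_1(y_1|z_2) \ge \alpha_1$ for all $(y_1, z_2) \in \mathcal Y \times \mathcal Z$. Then

$$|q_2(\cdot|y_1) - q_2(\cdot|y'_1)| \le 1 - |\mathcal Z| \cdot \alpha_2, \quad\text{and}\quad
|q_1(\cdot|z_2) - q_1(\cdot|z'_2)| \le 1 - |\mathcal Y| \cdot \alpha_1$$

for all $y_1, y'_1 \in \mathcal Y$ and $z_2, z'_2 \in \mathcal Z$.

Thus in this case Dobrushin's uniqueness condition is satisfied with a $2 \times 2$
coupling matrix, with entries $1 - |\mathcal Y| \cdot \alpha_1$ and
$1 - |\mathcal Z| \cdot \alpha_2$ outside the diagonal. (It does not matter that
$\mathcal Y$ and $\mathcal Z$ may be different.) The coupling matrix has norm

$$\le \max\{1 - |\mathcal Y| \cdot \alpha_1,\ 1 - |\mathcal Z| \cdot \alpha_2\}.$$

By the argument proving Theorem 2, it follows that

$$W_2^2(p^2, q^2) \le \frac{1}{\bigl(1 - \max\{1 - |\mathcal Y| \cdot \alpha_1,\ 1 - |\mathcal Z| \cdot \alpha_2\}\bigr)^2} \cdot
\Bigl(\mathbb E\bigl|p_1(\cdot|Y_2) - q_1(\cdot|Y_2)\bigr|^2 + \mathbb E\bigl|p_2(\cdot|Y_1) - q_2(\cdot|Y_1)\bigr|^2\Bigr),$$

and, consequently,

$$|p^2 - q^2|^2 \le \frac{2}{\bigl(1 - \max\{1 - |\mathcal Y| \cdot \alpha_1,\ 1 - |\mathcal Z| \cdot \alpha_2\}\bigr)^2} \cdot
\Bigl(\mathbb E\bigl|p_1(\cdot|Y_2) - q_1(\cdot|Y_2)\bigr|^2 + \mathbb E\bigl|p_2(\cdot|Y_1) - q_2(\cdot|Y_1)\bigr|^2\Bigr).$$  (A1)

Lemma 4 follows from (A1) by a recursive argument, dividing the index set into
two possibly equal parts of size $\lceil n/2 \rceil$ and $\lfloor n/2 \rfloor$, and
applying (A1) for the two parts. Then

$$\frac{2}{\bigl(1 - \max\{1 - |\mathcal Y| \cdot \alpha_1,\ 1 - |\mathcal Z| \cdot \alpha_2\}\bigr)^2}$$

shall be replaced by

$$\Bigl(\frac{2}{(|\mathcal X| \cdot \alpha)^2}\Bigr)^{\lceil n/2 \rceil}.$$

Repeating this step about $\log_2 n$ times we get the statement of the lemma. □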


Appendix B 

Let $\mathbb Z^d/l$ ($l > 1$ integer) denote the sub-lattice in $\mathbb Z^d$,
consisting of points whose coordinates are all multiples of $l$, and let
$\mathcal C_l$ denote the set of finite unions of $l$-sided cubes with vertices in
$\mathbb Z^d/l$.

The approach by Olivieri and Picco is based on the following definition of strong
mixing:


Definition by Olivieri and Picco.

The Gibbs measure $q$ on $\mathcal X^{\mathbb Z^d}$ with finite range of interaction is
called strongly mixing over $\mathcal C_l$, if there exist numbers $\gamma > 0$,
$K > 0$ such that: for any sets $\Lambda \in \mathcal C_l$, $M \subset \Lambda$ and any
two outside configurations $y_{\bar\Lambda}$ and $z_{\bar\Lambda}$ differing only at a
single site $k \notin \Lambda$, we have

$$|q_M(\cdot|y_{\bar\Lambda}) - q_M(\cdot|z_{\bar\Lambda})| \le K \cdot \exp(-\gamma \cdot \rho(k,M)).$$  (B.1)

In force of the following theorem, if $l$ is sufficiently large then it is enough to
require (B.1) just for cubes in $\mathcal C_l$, to get (B.1) for all
$\Lambda \in \mathcal C_l$, however, with a different $\gamma$ and $K$.

Olivieri and Picco's Effectivity Theorem, [O-P], [M-O1].

Assume that the Gibbs measure $q$ on $\mathcal X^{\mathbb Z^d}$ has finite range of
interaction. For any $\gamma, K > 0$ there exists an $l_0$ such that: if for some
$l > l_0$ (B.1) holds for all $l$-sided cubes $\Lambda \in \mathcal C_l$, all
$M \subset \Lambda$ and all $k \notin \Lambda$, then (B.1) also holds for all
$\Lambda \in \mathcal C_l$, and $M$ and $k$ as above, with different $\gamma$ and $K$.

We use a slightly more general definition, although we cannot justify it with
some analog of the above Effectivity Theorem:
Definition: Strong mixing over $\mathcal C_l$.

Let $\varphi : \mathbb Z_+ \to \mathbb R_+$ be a function satisfying (4.1.1). Fix an
integer $l > 1$. The ensemble of conditional distributions
$q_\Lambda(\cdot|x_{\bar\Lambda})$ on $\mathcal X^{\mathbb Z^d}$ is called strongly
mixing over $\mathcal C_l$, with coupling function $\varphi$, if for any sets
$\Lambda \in \mathcal C_l$, $M \subset \Lambda$, and any two outside configurations
$y_{\bar\Lambda}$ and $z_{\bar\Lambda}$ differing only at the single site $k$, (4.1.2)
holds. (We do not assume finite range of interaction.)

For measures strongly mixing over $\mathcal C_l$ one can prove a logarithmic Sobolev
inequality by means of the following modifications of Theorem 1 and the Auxiliary
Theorem:

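Theorem 1'.

Consider a measure $q_\Lambda$ on
$\mathcal X^\Lambda = \prod_{j=1}^N \mathcal X^{\Lambda_j}$, where

$$\Lambda = \cup_{j=1}^N \Lambda_j, \quad \Lambda_j \cap \Lambda_k = \emptyset \text{ for } j \ne k, \quad |\Lambda_j| = m.$$

Set

$$\alpha = \min\{q_i(x_i|\bar x_i) : q_\Lambda(x_\Lambda) > 0,\ i \in \Lambda\}.$$

Fix a $p_\Lambda = \mathcal L(Y_\Lambda)$ on $\mathcal X^\Lambda$ satisfying

$$q_\Lambda(x_\Lambda) = 0 \implies p_\Lambda(x_\Lambda) = 0.$$

Assume that $q_\Lambda$ satisfies all the inequalities

$$W_2^2\bigl(p_I(\cdot|\bar y_I), q_I(\cdot|\bar y_I)\bigr) \le C \cdot \mathbb E\Bigl\{
\sum_{\Lambda_j \subset I} W_2^2\bigl(p_{\Lambda_j}(\cdot|\bar Y_{\Lambda_j}), q_{\Lambda_j}(\cdot|\bar Y_{\Lambda_j})\bigr)
\Bigm| \bar Y_I = \bar y_I\Bigr\},$$

where $I \subset \Lambda$ is the union of some of the sets $\Lambda_j$, and
$\bar y_I \in \mathcal X^{\Lambda\setminus I}$ is a fixed sequence. Then

$$D(p_\Lambda\|q_\Lambda) \le \frac{4Cm}{\alpha^m} \cdot \sum_{j=1}^N
\mathbb E\, W_2^2\bigl(p_{\Lambda_j}(\cdot|\bar Y_{\Lambda_j}), q_{\Lambda_j}(\cdot|\bar Y_{\Lambda_j})\bigr).$$

This can be proved by the same argument as Theorem 1, using Lemma 2, the
inequalities

$$\bigl|p_{\Lambda_j}(\cdot|\bar y_{\Lambda_j}) - q_{\Lambda_j}(\cdot|\bar y_{\Lambda_j})\bigr|^2
\le m \cdot W_2^2\bigl(p_{\Lambda_j}(\cdot|\bar y_{\Lambda_j}), q_{\Lambda_j}(\cdot|\bar y_{\Lambda_j})\bigr),$$

and, in each induction step, fixing a whole new block $Y_{\Lambda_j}$.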
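Auxiliary Theorem for measures strongly mixing over $\mathcal C_l$.

Fix an integer $l$, and assume that the ensemble of conditional distributions
$q_\Lambda(\cdot|x_{\bar\Lambda})$ on $\mathcal X^{\mathbb Z^d}$ satisfies the strong
mixing condition over $\mathcal C_l$, with coupling function $\varphi$. Let
$\Lambda \in \mathcal C_l$, and fix an outside configuration $y_{\bar\Lambda}$. For
fixed $m$ let $\mathcal I_{ml}$ denote the set of $m \cdot l$-sided cubes from
$\mathcal C_l$ intersecting $\Lambda$. Then for large enough $m$ and any measure
$p_\Lambda$ on $\mathcal X^\Lambda$

$$W_2^2\bigl(p_\Lambda, q_\Lambda(\cdot|y_{\bar\Lambda})\bigr)
\le C \cdot \sum_{I \in \mathcal I_{ml}} \mathbb E\, W_2^2\bigl(p_{I\cap\Lambda}(\cdot|\bar Y_{I\cap\Lambda}), q_{I\cap\Lambda}(\cdot|\bar Y_{I\cap\Lambda})\bigr)$$
$$\le C \cdot (m \cdot l)^d \cdot \sum_{I \in \mathcal I_{ml}} \mathbb E\bigl|p_{I\cap\Lambda}(\cdot|\bar Y_{I\cap\Lambda}) - q_{I\cap\Lambda}(\cdot|\bar Y_{I\cap\Lambda})\bigr|^2,$$

where $C$ and $m$ depend on the dimension $d$ and on the function $\varphi$.

The proof uses a Gibbs sampler, updating (intersections with $\Lambda$ of) randomly
chosen cubes of side $m \cdot l$ from $\mathcal C_l$ (for an appropriate $m$).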